Abstract

This  paper  presents  a  new  approach  to 
sensor  management  of  distributed  sensor 
networks  (DSNs).  Given  the  current  pro¬ 
liferation  of  remote  sensors  and  their  in¬ 
herent  resource  constraints,  DSN  managers 
face  a  growing  problem  of  managing  the 
tradeoff  between  DSN  performance  and  re¬ 
source  consumption.  Our  model,  the  Sensor 
Network  Optimal  OPerations  Simulator,  or 
SNOOPS,  addresses  this  tradeoff  by  identi¬ 
fying  a  DSN  control  strategy  that  reaches 
an  acceptably  certain  representation  of  the 
search  region  while  minimizing  operating 
costs. 

The  core  of  the  SNOOPS  model  is  an 
approximate  dynamic  programming  (ADP) 
process  that  uses  simulation-based  policy  it¬ 
eration  to  identify  an  efficient  DSN  con¬ 
trol  strategy.  Results  indicate  that  the 
SNOOPS-recommended  DSN  control  strat¬ 
egy  improves  the  efficiency  of  DSN  opera¬ 
tions  by  up  to  47  percent  over  the  Base  Pol¬ 
icy  of  activating  all  sensors. 

In addition to determining efficient DSN control strategies, our model also provides a research base to (1) investigate the fusion of observations from disparate sensors, (2) demonstrate the use of non-imaging sensors to provide adequate situational awareness where precision emplacement of more-capable sensors is not possible, and (3) develop operational concepts to integrate DSN operations with user needs.


Introduction 

Traditional sensor networks have been in use for years for military and civilian surveillance applications, as well as for electronic, economic, medical, and hazard detection systems. These traditional sensor networks used relatively simple sensors that were connected to permanent infrastructure, with virtually unlimited power and communications resources. Without strict resource constraints, the DSN sensor management problem was trivial: keep all the sensors active all the time, continuously sampling the environment and reporting observations. This approach is referred to in this paper as the Base Policy. Use of the Base Policy led to a greater emphasis on sensor fusion than on sensor management.

Recently, however, the types of sensors used in these sensor networks have begun to evolve, primarily due to advances in integrated circuit, computing, and communications technologies (Pottie and Kaiser 2000). These technological advances have resulted in the ability to build advanced micro-electromechanical systems (MEMS), which have in turn enabled the development of more capable, remote sensors with a wireless communication capability. These remote sensors, commonly referred to as unattended ground sensors (UGS), consist of a variety of sensor technologies that are packaged for deployment and perform the mission of remote target detection, location, and/or recognition (Srour 1999).

Networks  of  intelligent,  disparate  sensors 
(like  UGS)  that  are  distributed  spatially  and 
geographically  are  generally  referred  to  as 
Distributed  Sensor  Networks,  or  DSNs.  The 
incorporation  of  UGS  has  vastly  expanded 
the  capabilities  of  DSNs  versus  traditional 
sensor  networks  by  facilitating  rapid  deploy¬ 
ment  and  reconfiguration  in  largely  uncon¬ 
strained  arrangements  (Clare  et  al.  1999). 

However,  these  same  UGS  have  also  intro¬ 
duced  new  battery  power  and  communica¬ 
tions  bandwidth  constraints.  While  signif¬ 
icant  accomplishments  have  been  achieved 
in  developing  sensor-level  approaches  to  ad¬ 
dress  these  constraints,  there  still  exists  a 
need  for  system-level  management  of  indi¬ 
vidual  DSN  assets. 

In the remainder of this paper, we formally define the DSN sensor management problem, identify the Test Bed Scenario used in our research, briefly review related DSN sensor management research, describe our general approach and methodology for executing this approach, identify our modeling approach, define the mathematical formulation for our solution technique, present some interesting results, and offer a plan for continued research.

Problem  Definition 

In its simplest form, DSN sensor management is fundamentally a dynamic resource-allocation problem (Malhotra 1995), where
we  must  balance  DSN  performance  (mea¬ 
sured  in  terms  of  target  detection  and  local¬ 
ization)  against  resource  consumption  (mea¬ 
sured  in  terms  of  power  usage).  Solving  this 
problem  necessitates  developing  a  sequential 
decision  process  for  providing  control  over  a 
sensor  network  where  the  penalties  and  re¬ 
wards  for  actions  are  only  revealed  over  time. 

The sequential aspect of this process lends itself to interpretation as a closed-loop feedback control system, as described by Bertsekas (2000) in Figure 1.


Figure  1:  Closed-Loop  Feedback  Control 
System 

In this figure, $u$ depicts the Input, $w$ depicts the random Disturbance, $S$ is the System Function, $y$ depicts the Output, and $\pi$ is the Feedback Controller. The term $u = \pi(y)$ indicates that each Input is a function of the previous Output.

We can easily interpret the DSN sensor management problem in terms of a closed-loop feedback control system. Initial control instructions direct specific sensors to take measurements of the environment. These measurements, or observations, are translated into sensor reports, which are fused using a Bayesian approach to provide an updated representation of the search region. This updated representation is then used by the Feedback Controller to determine the next iteration of sensor control instructions.

Using  this  interpretation,  our  objective 
in  addressing  the  DSN  sensor  management 
problem  is  to  develop  a  strategy  for  the  Feed¬ 
back  Controller  that  allows  us  to  detect  and 
locate  any  objects  of  interest  in  the  search 
region  while  conserving  scarce  resources.  In 
other  words,  we  need  to  develop  a  model 
that  individually  tasks  DSN  sensing  assets 
in  such  a  manner  as  to  reach  an  acceptably 
certain  representation  of  the  search  region 
while  minimizing  DSN  operating  costs,  mea¬ 
sured  in  terms  of  power  usage. 

Test  Bed  Scenario 

To  provide  a  specific  context  for  the  DSN 
sensor  management  problem,  we  created  a 
Test  Bed  Scenario,  described  below. 

Search  Region.  The  search  region  for 
the  Test  Bed  Scenario  consisted  of  the  three- 
kilometer  by  three-kilometer  terrain  box  de¬ 
picted  in  Figure  2. 

Objective.  The  objective  for  the  Test 
Bed  Scenario  consisted  of  using  the  DSN  to 
detect  and  locate  any  objects  of  interest  lo¬ 
cated  within  the  search  region  while  mini¬ 
mizing  DSN  operating  costs  in  order  to  ex¬ 
tend  the  life  of  the  DSN. 

Figure 2: Terrain Box

Sensor Network. The DSN for the Test Bed Scenario was fairly dense, consisting of 35 individual sensing nodes to cover the search region. We used only one sensor type to simplify the sensor fusion problem, selecting acoustic sensors since they are the most common non-imaging sensors in use today (Hopkins et al. 2000). The acoustic sensors modelled in our scenario consisted of a circular array of microphones that are capable of providing a line-of-bearing estimate to a detected object of interest.

We modelled the sensors with a “self-locating” capability, a realistic expectation as there are currently several research efforts ongoing to enable self-localization of a network of sensors (e.g., see Moses et al. (2002)). To simplify the problem, we chose to establish a fixed cluster topology, with the 35 sensors organized into four logical clusters of 6 to 12 sensors each, as shown in Figure 3.

In  this  configuration,  each  sensor  is  subordi¬ 
nate  to  a  cluster  head,  which  is  in  turn  sub¬ 
ordinate  to  the  sensor  network  controller. 

Figure 3: Sensor Location and Cluster Topology

We modelled the sensors with onboard power management tools that would permit two operating modes. In the “Sleep” mode, a sensor operates at the lowest possible power consumption level and is just awaiting instructions to begin sensing the environment. Once awakened, the sensor operates in an “Active” mode. In this mode, the sensor actively observes the environment, expending higher amounts of energy. If the sensor detects an object of interest, whether a true detection or false alarm, it will then expend an additional amount of energy to transmit a sensor report.

Related  Work 

In  recent  years,  there  has  been  an  increased 
level  of  effort  focused  on  developing  method¬ 
ologies  that  individually  task  DSN  sensing 
assets.  Three  of  the  more  promising  DSN 
sensor  management  approaches  being  de¬ 
veloped  include  Bayes  Risk  minimization, 
information-theory,  and  dynamic  program¬ 
ming  (DP). 

Bayes Risk Minimization. In this approach, the objective is to apply available sensing assets so as to minimize the conditional expectation of the Bayes average risk with respect to a pre-defined loss functional. In effect, sensors are selected to reduce the risk of making an incorrect detection or localization decision.

In  Sinno  et  al.  and  Cochran  et  al.  (1999), 
the  search  region  is  divided  into  a  number  of 
disjoint  cells  and  a  hypothesis  is  established 
for  each  cell  that  a  target  is  present  in  that 
particular  cell.  Bayes  Rule  is  applied  to  the 
prior  probabilities  of  the  hypotheses  and  the 
sensors’  probabilities  of  detection  and  false 
alarm  to  calculate  the  posterior  probabili¬ 
ties  of  each  hypothesis,  given  a  particular 
test  and  its  outcome.  Using  these  posterior 
probabilities,  the  conditional  expectation  of 
the  Bayes  average  risk  is  determined  before 
any  test  is  actually  run  and  the  action  result¬ 
ing  in  the  minimum  expected  Bayes  Risk  is 
selected. 

While this sort of a closed-loop feedback control policy is useful for its stated purpose, it fails to address two of our critical DSN sensor management requirements: reducing uncertainty in the search region representation and minimizing resource consumption.

Information Theory. In this approach, the objective is to apply available sensing assets so as to reduce the uncertainty in the state and hence produce a quantifiable amount of information. McIntyre (1996) postulates that sensor observations produce potential information gains that may reduce (i) uncertainty of location of undetected targets, (ii) uncertainty associated with the estimate of a target’s state vector, or (iii) uncertainty associated with target identity.

McIntyre (1998) and McIntyre and Hintz (1997) present models that schedule sensor usage based on Shannon’s definition of entropy as a measure of potential information gain. The models use the measure of potential information gain to determine whether to use sensor resources to track targets already in track or to search for new targets, and then to decide which sensor to use.

Kastella (1997) and Schmaedeke (1993) present models that schedule sensor usage based on the Kullback-Leibler discrimination information function as a measure of potential discrimination gain. The models estimate the expected discrimination gain for the observation of each sector of the search region and then activate sensors in such a way as to maximize the expected discrimination gain.

This closed-loop feedback control policy shows promise as a DSN sensor management policy since it is focused on reducing the uncertainty in the search region representation. However, it chooses to perform the best action at each particular sensing iteration, without regard for the evolution of the scenario. This may not be desirable since one may prefer to perform actions that are not optimal in the information-theoretic sense but are superior in terms of mission success (Malhotra 1995).

Dynamic Programming. In this approach, the objective is to apply the sensing assets so as to optimize DSN performance within specified resource constraints. Castañón (1995) applies DP to the sensor management problem, characterizing it as a dynamic sequential hypothesis testing problem, with hypotheses associated with specific subregions of the search region. His model selects sensors to maximize the composite probability of identifying the correct hypothesis at the end of a fixed horizon.

In subsequent work, Castañón (1997) uses DP to specifically address the problem of classifying a known number of unknown objects. He again formulates the problem as a sequential hypothesis testing problem, where the hypotheses are associated with specific classifications of the objects. His model finds a decision rule that minimizes the expected total cost over all admissible decision rules, subject to sensor use constraints.

A  DP-based  closed-loop  feedback  control 
policy  appears  to  be  ideally  suited  for  our 
definition  of  DSN  sensor  management.  In 
fact,  Bertsekas  (2000)  claims  that  “DP  is 
the  only  general  approach  for  sequential 
decision-making”  because  of  its  guaranteed 
convergence  to  the  optimal  solution  (given 
the  correct  cost-to-go’s)  and  its  immunity 
to  noise.  Additionally,  Schmaedeke  (1993) 
declares  that  “One  of  the  most  promising 
methods  to  provide  the  sequential  decision 
process  necessary  to  develop  control  instruc¬ 
tions  is  approximate  dynamic  programming 
(ADP).” 

General  Approach 

Our  approach  to  address  the  DSN  sensor 
management  problem  consists  of  developing 
an  ADP-based  closed-loop  feedback  control 
process  that  identifies  an  efficient  DSN  con¬ 
trol  strategy.  The  framework  we  used  to  de¬ 
velop  this  closed-loop  feedback  control  pro¬ 
cess  is  depicted  in  Figure  4. 

In Step 1 of this framework, we decide which sensors to activate for the particular sensing iteration, using an ADP construct described below. In Step 2, the active sensors operate, taking readings and passing reports up the DSN hierarchy, as defined in the cluster topology. In Step 3, we fuse the reports submitted by the active sensors to update our representation of the search region. Finally, in Step 4, we determine whether our current representation of the search region is adequate to terminate the DSN mission. If not, we return to Step 1 and repeat the cycle until either an object of interest is located or we are certain enough that no object is present.

Figure 4: Sensor Management Framework (legible block labels: ADP Optimization (Performance vs. Cost); Detection Model)

The ADP optimization process balances both present and expected future costs while working towards an ultimate objective. At each stage of the problem, we use Monte Carlo simulation to implement a one-step lookahead version of a rollout policy to develop approximations for the transition costs and cost-to-go functions for a number of promising control actions. We then use these approximations to select the best candidate action, execute the selected action, and then proceed to the next stage, continuing this process until we achieve the DSN objective.
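The one-step lookahead rollout described above can be sketched as follows. This is a toy illustration under stated assumptions, not the SNOOPS implementation: the function names (`rollout_action`, `toy_step`), the toy state (a count of still-uncertain cells), the cost figures, and the use of "activate all three sensors" as the base policy inside the simulation are all hypothetical.

```python
import random

def rollout_action(state, actions, step, base_policy, is_terminal,
                   n_sims=200, max_stages=50, rng=None):
    """One-step lookahead rollout.

    For each candidate action, estimate by Monte Carlo simulation the
    transition cost plus the cost-to-go obtained by following
    base_policy until termination, then return the cheapest action.
    """
    rng = rng or random.Random(0)
    best_action, best_cost = None, float("inf")
    for action in actions:
        total = 0.0
        for _ in range(n_sims):
            # First stage: apply the candidate action.
            next_state, cost = step(state, action, rng)
            stages = 0
            # Remaining stages: follow the base policy (the rollout).
            while not is_terminal(next_state) and stages < max_stages:
                next_state, stage_cost = step(
                    next_state, base_policy(next_state), rng)
                cost += stage_cost
                stages += 1
            total += cost
        mean_cost = total / n_sims
        if mean_cost < best_cost:
            best_action, best_cost = action, mean_cost
    return best_action, best_cost

def toy_step(uncertain_cells, n_active, rng):
    """Hypothetical dynamics: each active sensor resolves one cell
    with probability 0.6; cost is a fixed 0.5 per stage plus 1.0 per
    active sensor."""
    resolved = sum(1 for _ in range(n_active) if rng.random() < 0.6)
    cost = 0.5 + 1.0 * n_active
    return max(0, uncertain_cells - resolved), cost

action, est_cost = rollout_action(
    state=4, actions=[1, 2, 3], step=toy_step,
    base_policy=lambda s: 3, is_terminal=lambda s: s == 0)
print(action, round(est_cost, 2))
```

The `max_stages` cap is a practical guard for simulated trajectories that fail to terminate; it is not part of the formulation in the text.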

Methodology 

Our  methodology  for  implementing  and 
validating  the  approach  described  above 
consists  of  executing  the  following  tasks: 

•  DSN  Simulation  Model  Development 


•  DP  Formulation 

•  Policy  Comparison  Analysis 

These  three  tasks  are  summarized  below 
and  described  in  detail  in  subsequent  sec¬ 
tions  of  the  paper. 

DSN Simulation Model Development. Since there was no actual DSN available with which to experiment, we developed our own DSN simulation model. Our model, the Sensor Network Optimal Operations Simulator, or SNOOPS, serves two primary purposes. First, it simulates the performance of any static or dynamic stationary sensor control policy. Second, it implements the aforementioned ADP process to identify a “near-optimal” sensor control policy (hereafter referred to as the SNOOPS Policy).

DP Formulation. We used the DP model for dynamic search suggested in Castañón (1995, 1997) and revisited in Patek (2001) as the basis for the mathematical formulation of our sensor management process. The cited works assume that a sensor searches only one cell at a time and that false alarms will not occur. In our model, we extend these results in a number of ways. First, we consider searches where multiple cells may be searched simultaneously. Second, we consider searches which can produce false alarms. Finally, we extend the formulation from a finite to an infinite horizon.

Policy Comparison Analysis. Using the SNOOPS Model, we quantified the performance of the Base Policy and SNOOPS Policy. We collected output data that included the total cost to reach termination, the number of stages required to reach termination, and the amount of computation time required to reach termination. We also included the performance of various other simple dynamic sensor control policies to provide additional performance comparisons.

Modelling  Approach 

The  SNOOPS  DSN  simulation  model 
makes  use  of  three  fundamental  modelling 
components:  a  Detection  Model,  a  Sensor 
Fusion  Model,  and  a  Bayesian  Model. 

Detection Model. In the SNOOPS model, each active sensor makes a binary decision, $y$, between two hypotheses for each cell observed: $H_1$ that an object of interest is present in the cell and $H_0$ that an object of interest is not present in the cell. The likelihood of declaring whether $H_0$ or $H_1$ is true is dictated by the sensor’s performance capabilities.

In standard signal detection theory, $\alpha$ corresponds to the probability of a “False Alarm” (or false positive) and $\beta$ corresponds to the probability of a “Miss” (or false negative). Conversely, the probability of a “Detection” (or true positive) is $(1 - \beta)$ and the probability of a “Quiet” outcome (or true negative) is $(1 - \alpha)$.

SNOOPS uses the curves depicted in Figures 5 and 6 to represent sensor performance characteristics. Figure 5 represents the probability of detection (PD) curve used to determine $(1 - \beta)$, while Figure 6 represents the probability of false alarm (PF) curve used to determine $\alpha$. Both of these curves are a function of sensor-target (S-T) distance.
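A minimal sketch of the detection model follows. The actual PD and PF curves are given only graphically in Figures 5 and 6, so the curve shapes, the 1000 m maximum range, and every numeric constant below are placeholder assumptions chosen for illustration; only the structure (distance-dependent probabilities driving a binary local decision) follows the text.

```python
import math
import random

def p_detect(distance_m, max_range_m=1000.0):
    """Hypothetical PD curve, i.e. (1 - beta): decays with
    sensor-target distance and is zero beyond maximum range."""
    if distance_m >= max_range_m:
        return 0.0
    return 0.95 * math.exp(-3.0 * distance_m / max_range_m)

def p_false_alarm(distance_m, max_range_m=1000.0):
    """Hypothetical PF curve, i.e. alpha: small, growing slowly
    with distance, and zero beyond maximum range."""
    if distance_m >= max_range_m:
        return 0.0
    return 0.02 + 0.03 * distance_m / max_range_m

def local_decision(target_present, distance_m, rng):
    """Draw the binary decision y for one observed cell:
    1 = object declared present, 0 = object declared absent."""
    if target_present:
        p_one = p_detect(distance_m)       # true positive
    else:
        p_one = p_false_alarm(distance_m)  # false positive
    return 1 if rng.random() < p_one else 0

rng = random.Random(42)
reports = [local_decision(True, 250.0, rng) for _ in range(10)]
print(reports)
```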

Figure 5: PD Curve

Figure 6: PF Curve

If the sensor detects an object of interest (either a true or false positive), it submits a sensor report. SNOOPS handles directional and non-directional sensors differently. Directional sensors (e.g., acoustic) are able to provide a bearing to the target, reported in the range (0, 360), with 360 degrees indicating due North. The result of a detection will be a “Report Cone,” a region extending out to the sensor’s maximum range and centered on the reported bearing to the target. The width of this cone is determined by the sensor capabilities. A Report Cone is graphically depicted in Figure 7.
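The Report Cone geometry can be sketched as a membership test on cell centers. The coordinate convention (x east, y north, bearings measured clockwise from North), the treatment of the sensor's own location, and the function names are assumptions made for this illustration; the text specifies only that the cone is centered on the reported bearing, extends to maximum range, and that 360 degrees denotes due North.

```python
import math

def compass_bearing(sensor_xy, cell_xy):
    """Bearing from sensor to cell center, in degrees clockwise
    from North, with due North reported as 360 rather than 0."""
    dx = cell_xy[0] - sensor_xy[0]  # east offset
    dy = cell_xy[1] - sensor_xy[1]  # north offset
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0
    return 360.0 if bearing == 0.0 else bearing

def in_report_cone(sensor_xy, cell_xy, reported_bearing,
                   cone_width, max_range):
    """True if the cell center lies inside the cone of angular
    width cone_width centered on reported_bearing."""
    dist = math.hypot(cell_xy[0] - sensor_xy[0],
                      cell_xy[1] - sensor_xy[1])
    if dist > max_range:
        return False
    if dist == 0.0:
        return True  # assumption: the apex cell counts as inside
    diff = abs(compass_bearing(sensor_xy, cell_xy) - reported_bearing) % 360.0
    diff = min(diff, 360.0 - diff)  # shortest angular separation
    return diff <= cone_width / 2.0

# A cell due North of the sensor, with a reported bearing of 350 degrees
# and a 30 degree wide cone.
print(in_report_cone((0.0, 0.0), (0.0, 100.0), 350.0, 30.0, 500.0))
```

A Report Disk, by contrast, needs only the range check in this sketch.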

Non-directional sensors (e.g., seismic), on the other hand, are unable to provide a bearing to the target. The result of a detection will be a “Report Disk,” a region centered on the sensor location and extending to its maximum range. A Report Disk is graphically depicted in Figure 8.

Figure 7: Report Cone

Figure 8: Report Disk

Sensor Fusion Model. SNOOPS uses a distributed fusion approach, with sensor fusion occurring at multiple locations throughout the DSN. Each cluster head fuses the input from its subordinate sensors and then forwards a fused cluster report to the DSN controller. The DSN controller then fuses these inputs to develop a global inference. This sharing of the computational burden throughout the sensor network provides several benefits, including reduced power and bandwidth consumption.

While the sensor reports are created in response to a binary local decision, each of the fusion centers in the SNOOPS model uses a fusion of probabilities scheme to fuse the reports. In this scheme, detection probabilities are used instead of actual observation vectors or local decisions. These detection probabilities are derived from the previously described sensor performance curves. Krzysztofowicz and Long (1990) claim that the fusion of probabilities scheme offers high performance and flexibility for distributed multi-sensor systems, has modest requirements for communications channels, and is appropriate for situations where observations naturally occur in the form of detection probabilities.

Bayesian Model. The centerpiece of the fusion of probabilities scheme described above is an implementation of Bayes Theorem, which describes how to develop inferences based on prior evidence and current observations. In the SNOOPS model, we assume that there exists a current belief about the hypothesis that an object of interest is present in cell $i$ prior to receiving any sensor reports. This prior, or unconditional, belief about the probability of cell $i$ containing a target is $p(H_1)$, and our prior belief that cell $i$ does not contain a target is then $p(H_0) = 1 - p(H_1)$.

Consider $y$ as a binary observation returned by a sensor after searching cell $i$, with the values of $y$ being either 0 or 1. The case where $y = 1$ indicates that an object of interest was detected in cell $i$, and $y = 0$ indicates that no object of interest was detected in cell $i$. The likelihood of observing $y$ given the presence (or absence) of an object of interest is then $p(y \mid H_1)$ (or $p(y \mid H_0)$). These detection probabilities are derived from the appropriate sensor performance curves.

The sensor fusion problem becomes: Given $y$, determine the posterior probability of cell $i$ containing an object of interest, or in other words, find $p(H_1 \mid y)$. In general, we can use Bayes Theorem to find $p(H_1 \mid y)$ as follows:

$$p(H_1 \mid y) = \frac{p(y \mid H_1)\, p(H_1)}{p(y)}$$

with $p(y) = p(y \mid H_1)\, p(H_1) + p(y \mid H_0)\, p(H_0)$. For our model, we have formulated specific implementations of Bayes Theorem, which we will describe in detail in the System Dynamics section of this paper.
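As a worked illustration of the general Bayes update above, the following sketch computes the posterior for a single cell and then chains the update over several reports. The function names are hypothetical, and the sequential chaining assumes the reports are conditionally independent; it is a generic textbook construction, not the specific implementation that the paper defers to its System Dynamics section.

```python
def bayes_update(prior_h1, y, p_detect, p_false_alarm):
    """Posterior p(H1 | y) for one cell after a binary observation y.

    p_detect is (1 - beta) and p_false_alarm is alpha for the
    reporting sensor at the relevant sensor-target distance.
    """
    like_h1 = p_detect if y == 1 else 1.0 - p_detect
    like_h0 = p_false_alarm if y == 1 else 1.0 - p_false_alarm
    evidence = like_h1 * prior_h1 + like_h0 * (1.0 - prior_h1)
    if evidence == 0.0:
        raise ValueError("observation has zero probability under the model")
    return like_h1 * prior_h1 / evidence

def fuse_reports(prior_h1, reports):
    """Apply bayes_update once per (y, p_detect, p_false_alarm)
    report, treating reports as conditionally independent."""
    belief = prior_h1
    for y, p_d, p_f in reports:
        belief = bayes_update(belief, y, p_d, p_f)
    return belief

# Prior 0.1, one detection from a sensor with PD = 0.9 and PF = 0.05:
# 0.9 * 0.1 / (0.9 * 0.1 + 0.05 * 0.9) = 0.09 / 0.135, about 0.667.
print(bayes_update(0.1, 1, 0.9, 0.05))
```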

DP  Formulation 

We  begin  the  mathematical  formulation  of 
our  sensor  management  approach  by  estab¬ 
lishing  some  initial  notation.  We  then  de¬ 
fine  our  model  in  terms  of  normal  DP  ele¬ 
ments:  states,  control  actions,  disturbances, 
system  dynamics,  cost  structure,  objective 
function,  and  finally,  our  ADP  solution  ap¬ 
proximation. 

Initial  Notation.  We  introduce  the  fol¬ 
lowing  notation  that  will  be  used  throughout 
the  remainder  of  this  paper  (with  more  de¬ 
tailed  explanations  provided  in  subsequent 
sections): 

Let $E$ denote the environment (search region) consisting of $C$ search locations, or cells. Each cell $c_i$, $i = 1, 2, \ldots, C$, represents a unique portion of the search region and $E = \bigcup_{i=1}^{C} c_i$.

Associated with each cell $c_i$ there is a property, $P^i$, defined as “cell $i$ contains an object of interest.”

Let $x^i$ denote the probability that property $P^i$ is true, given all available information. Let $x = (x^1, x^2, \ldots, x^C)$ be the vector of current probabilities associated with each of the cells in the search region.


Let  t  =  0,1, ...  denote  the  stage  (sensing 
iteration)  of  the  DSN  sensor  management 
problem.  In  general,  the  DSN  sensor  man¬ 
agement  process  proceeds  indefinitely  until 
we  reach  an  acceptable  level  of  certainty 
about  the  search  area  representation  (Mal- 
hotra  1995).  For  this  reason,  we  cannot  as¬ 
sume  that  there  is  a  natural  (fixed,  finite) 
time  horizon,  a  priori. 

Let $G \in \Re$ denote the number of targets in the search region.

Let $S$ denote the set of sensors available to observe the search region. Each sensor $s = 1, 2, \ldots, m$ can observe a portion of the search region, the set of cells $E^s$. It is important to note that the sensor footprints $E^s$, $s = 1, 2, \ldots, m$, are not mutually exclusive sets (i.e., each cell $i$ may exist in multiple sensor footprints) and are constant for all stages $t$.

Let $u$ denote a test that is applied to the search region, where $u \in U$, the set of all possible tests. Each test $u$ corresponds to a different subset of the total available sensors $S$.

Let $S^u$ denote the set of sensors that are included in test $u$. The subset of cells observed during test $u$ will depend on the sensors included in the test and is denoted by $E^u$, where $E^u = \bigcup_{s \in S^u} E^s$.
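The relationship between tests, sensor subsets, and observed cells can be sketched with plain sets. The footprint data and function names below are hypothetical. Because each test is a subset of the sensor set, the number of candidate tests grows as 2**m (counting the empty test that activates no sensors), which for the 35-sensor Test Bed Scenario is roughly 3.4e10 and helps explain why only a number of promising control actions can be evaluated at each stage.

```python
from itertools import chain, combinations

def observed_cells(footprints, active_sensors):
    """Cells covered by a test: the union of the footprints of the
    sensors included in the test."""
    cells = set()
    for s in active_sensors:
        cells |= footprints[s]
    return cells

def all_tests(sensors):
    """Enumerate every subset of the sensor set, including the
    empty test. Feasible only for small sensor counts."""
    sensors = list(sensors)
    return chain.from_iterable(
        combinations(sensors, k) for k in range(len(sensors) + 1))

# Hypothetical footprints: sensor id -> set of cell indices.
# Footprints may overlap (cell 2 is seen by sensors 1 and 2).
footprints = {1: {1, 2}, 2: {2, 3}, 3: {4}}
print(observed_cells(footprints, {1, 2}))
print(sum(1 for _ in all_tests(footprints)))
```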

Let $y^{i,s}$ denote the local binary decision made at sensor $s$ after observing cell $i$. The value of this decision corresponds to a tentative decision concerning the presence of an object of interest in the cell. The binary decision can take values in $\{0, 1\}$, with probabilities described as follows for each sensor $s$ included in $S^u$:

$$P(y^{i,s} = 1 \mid \text{sensor } s \text{ active}) = \begin{cases} \alpha^{i,s} & \text{if } P^i \text{ is not true} \\ 1 - \beta^{i,s} & \text{if } P^i \text{ is true} \end{cases} \tag{1}$$

$$P(y^{i,s} = 0 \mid \text{sensor } s \text{ active}) = 1 - P(y^{i,s} = 1 \mid \text{sensor } s \text{ active}) \tag{2}$$

When $y^{i,s} = 1$, we say that cell $i$ is a member of $D^s$, the subset of cells in $E^s$ that are considered Detect cells. When $y^{i,s} = 0$, we say that cell $i$ is a member of $N^s$, the subset of cells in $E^s$ that are considered Non-Detect cells. The sets $D^s$ and $N^s$ are mutually exclusive and $E^s = D^s \cup N^s$.

Let $z$ denote the observation made by the current set of active sensors, $S^u$, where $z \in Z$, the observation space. The observation $z$ consists of the vectors $D^s$ and $N^s$, for each $s \in S^u$. For the example shown in Figure 7, where a single sensor $s$ was activated, the resultant observation $z$ consisted of the vectors $N^s$ = {set of gray cells} and $D^s$ = {set of black cells}.

Problem Types. We consider two types of DSN sensor management problems, Type I and Type II, each related to a different assumption regarding the interdependence of the hypotheses. In the following definitions, we borrow the terminology of “exclusive” and “independent” hypotheses from Castañón (1995).

Type I Problem. In this type of problem, we have a single $C$-ary hypothesis testing problem, where for each cell $i$ there exists a hypothesis $H^i$ that property $P^i$ is true. For these problems, we assume that these hypotheses are “exclusive” (i.e., exactly one hypothesis in $\{H^1, \ldots, H^C\}$ is true). This would correspond to a scenario with exactly one object of interest in the search region. For Type I problems, the component values of $x$ will necessarily sum to unity for each stage of the problem.

Type II Problem. In this type of problem, we have $C$ independent hypothesis testing problems, one for each cell $i$. In each of these hypothesis testing problems, there exists the hypothesis $H^i$ that property $P^i$ is true and the null hypothesis that there is nothing of interest in cell $i$. For these problems, we assume that the hypotheses are “independent” (i.e., multiple hypotheses in $\{H^1, H^2, \ldots, H^C\}$ may be true). This would correspond to a scenario with multiple objects of interest in the search region. This is also the most appropriate category for a scenario where the number of objects in the search region $G$ is unknown, and it will be the most frequent problem type encountered. For Type II problems, the component values of $x$ will not sum to unity in general.
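The practical difference between the two problem types shows up in how the probability vector is updated. The sketch below is an assumption-laden illustration, not the paper's update rule (which belongs to its System Dynamics section): it considers a single binary report `y` about one cell `j` from a sensor with detection probability `pd` and false-alarm probability `pf`, and all function names are hypothetical. Under exclusive (Type I) hypotheses the report reweights every cell and the vector is renormalized; under independent (Type II) hypotheses only cell `j` changes.

```python
def type1_update(x, j, y, pd, pf):
    """Exclusive hypotheses: exactly one cell holds the object, so a
    report on cell j changes the belief in every cell and the result
    is renormalized to sum to one."""
    weighted = []
    for i, x_i in enumerate(x):
        # Probability of y = 1 on cell j if the object is in cell i.
        p_one = pd if i == j else pf
        likelihood = p_one if y == 1 else 1.0 - p_one
        weighted.append(x_i * likelihood)
    total = sum(weighted)
    return [w / total for w in weighted]

def type2_update(x, j, y, pd, pf):
    """Independent hypotheses: each cell is its own binary test, so a
    report on cell j changes only the belief in cell j."""
    like_h1 = pd if y == 1 else 1.0 - pd
    like_h0 = pf if y == 1 else 1.0 - pf
    post = like_h1 * x[j] / (like_h1 * x[j] + like_h0 * (1.0 - x[j]))
    updated = list(x)
    updated[j] = post
    return updated

prior = [0.25, 0.25, 0.25, 0.25]
print(type1_update(prior, 0, 1, 0.9, 0.1))
print(type2_update(prior, 0, 1, 0.9, 0.1))
```

With this uniform prior both updates raise the belief in cell 0 to 0.75, but only the Type I vector still sums to one afterwards.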

States. In our interpretation of the DSN sensor management problem, our decisions regarding sensor control are based on the results of noisy sensor observations. While we can safely assume that one sensor’s observations will not have an impact on another sensor’s observations, it is probably less safe to assume that no sensor observations will be correlated. For example, several sensors will likely report on the presence of a target (or loud natural phenomenon), given it is within each of their footprints. While the sensor observations may not be quite independent, we will treat them as such for the practical purposes of our Test Bed Scenario.

Once received, these sensor observations are stored in the information
vector $I_t$, where $I_t = (z_0, z_1, \ldots, z_t, u_0, u_1, \ldots, u_{t-1})$
for $t = 0, 1, \ldots$ and $I_0 = z_0$. The information vector defines the
information available after stage t, including the sequence of tests
conducted and observations received through stage t.

In our solution to the DSN sensor management problem, we are looking for
a rule that tells us the control $u_t$ to be applied for every possible
information vector $I_t$. As new observations are added at each stage t,
the dimension of the information vector $I_t$ increases accordingly.
Since it is necessary to apply the DP algorithm over the entire space of
$I_t$, solution of this problem may be very complex and computationally
impossible for large values of t.

To simplify the problem, we need a function that is of smaller dimension
than $I_t$, yet summarizes all the essential content of $I_t$ as far as
control is concerned. As shown in Bertsekas (2000), a useful sufficient
statistic is the conditional probability distribution of the state. In
our case, this sufficient statistic is represented by $x_t$, the
conditional probability distribution at stage t, where
$x_t = \{x_t^1, \ldots, x_t^C\}$, such that $x_t^i$ is the conditional
probability that property $P^i$ is true, given the information available
at stage t, or $P(P^i = 1 \mid I_t)$.

The state space X is a convex set which represents the set of feasible
states for the problem. For Type I problems, X can be considered to be
the n-dimensional unit simplex since the $x^i$ values are constrained to
sum to unity, while for Type II problems, X is much larger since the
individual $x^i$ values are not constrained in the same manner.

Initial State. Our initial state is the prior distribution,
$x_0 = \{x_0^1, x_0^2, \ldots, x_0^C\}$, such that $x_0^i$ is the initial
probability that the property $P^i$ is true. This prior distribution
consists of subjective probabilities which measure the user’s degree of
belief in the presence of an object of interest within each cell before
applying any tests. For Type I problems, these values are normalized so
that the elements of $x_0$ will sum to unity.

For the Test Bed Scenario simulation runs, SNOOPS used the prior
distribution graphically represented in Figure 9 as the initial state.
In this figure, the color of each cell represents the value of $p(H^i)$
for that cell, with light colors corresponding to low probabilities and
dark colors corresponding to high probabilities.

Figure 9: Initial Probabilities

It is easy to correlate the prior probabilities depicted in Figure 9
with the terrain box shown in Figure 2. The highest probabilities
correspond to road and trail networks while the lowest probabilities
correspond to waterways and other major obstacles.

Termination States. A key feature of our formulation is that the
decision process proceeds indefinitely until we reach certainty (or near
certainty) about each of the hypotheses. To capture termination within
this framework, we assume that there exists a space of special
termination regions $\Omega_\gamma$, where $\gamma$ is the tolerance
that describes our required degree of certainty.

Just as optimal solutions in linear programming exist at extreme points
in the feasible region, termination regions exist around the extreme
points or vertices of the state space in our problem. For Type I
problems, the vector $x = (0, \ldots, 0, 1, 0, \ldots, 0) \in X$ (with
the 1 in the i-th component) corresponds to certainty about the property
$P^i$. For Type I problems, there will be C such termination regions.

For Type II problems, the vector
$x = (0, \ldots, 0, 1, 0, \ldots, 0, 1, 0, \ldots, 0) \in X$ (with a
1 in the i-th and j-th components) corresponds to certainty about the
properties $P^i$ and $P^j$. A similar construct exists for Type
II problems where more than two objects of interest exist within the
search region. Finally, the vector $x = (0, \ldots, 0) \in X$ corresponds
to certainty that the null hypothesis is true (i.e., there are no
objects of interest in the search region). For Type II problems,
there will be $2^C$ such termination regions.

A termination state $x_\Omega \in \Omega_\gamma$ is described
as a state that satisfies the requirement that
each $x^i$, $i = 1, \ldots, C$, lie within the range
$(0, \gamma)$ or $((1 - \gamma), 1)$. The termination states
are absorbing, and once the system reaches a
termination state it remains there at no further
cost. We point out that the case where
$\gamma = 0$ is reasonable whenever there exist all-powerful
tests that can locate an object of
interest with certainty.
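
The termination test described above can be illustrated with a short sketch. The function name `is_terminated` is hypothetical, and the closed comparisons (`<=`, `>=`) are an assumption made here so that the $\gamma = 0$ case accepts exact certainty; the paper states the ranges as intervals without specifying how the endpoints are treated.

```python
def is_terminated(x, gamma):
    """Return True when every cell probability lies within gamma of 0 or 1.

    x     : sequence of conditional probabilities, one per cell
    gamma : termination tolerance (assumed 0 <= gamma < 0.5)
    """
    # Assumption: endpoints are included so that gamma = 0 accepts exact certainty.
    return all(p <= gamma or p >= 1.0 - gamma for p in x)


if __name__ == "__main__":
    print(is_terminated([0.05, 0.95, 0.00], gamma=0.1))  # every cell is near 0 or 1
    print(is_terminated([0.05, 0.50, 0.00], gamma=0.1))  # the middle cell is still uncertain
```

The same check serves both problem types, since Type I and Type II problems differ in how the state evolves rather than in how near-certainty is measured for each component.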

Control Actions. At each stage t of the
problem, we choose a test $u_t \in U$ that corresponds
to a specific subset of the total available
sensors. The control space U consists of
all possible combinations of sensors, which
translates to a total of $2^S - 1$ unique tests
from which to choose, given there are S sensors
available.
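
To make the size of the control space concrete, the sketch below enumerates every non-empty subset of a small sensor list. The function name `enumerate_tests` and the sensor labels are hypothetical and used only for illustration.

```python
from itertools import combinations


def enumerate_tests(sensors):
    """List every non-empty subset of `sensors`; each subset is one candidate test."""
    tests = []
    for size in range(1, len(sensors) + 1):
        tests.extend(combinations(sensors, size))
    return tests


if __name__ == "__main__":
    tests = enumerate_tests(["s1", "s2", "s3"])
    print(len(tests))  # 2**3 - 1 = 7
    print(tests)
```

Because the count doubles with each added sensor, exhaustive enumeration of this kind is only practical for very small S, which is why the rollout approach restricts attention to a subset of promising tests.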

Disturbances. The random disturbance
$w_t$ is an element of a space $W_t$, $t = 0, 1, \ldots$,
and is characterized by a probability measure
defined on a collection of events in $W_t$.
This probability measure may depend explicitly
on $x_t$ and $u_t$ but not on values of
prior disturbances $w_{t-1}, \ldots, w_0$. These disturbances
describe the stochastic nature of
the DSN sensor management problem, that is, the aspects
of the problem that are not controllable.

System Dynamics. The system dynamics
are described by a transition function $f_t$,
a function that describes the evolution of the
system from state $x_t$ to state $x_{t+1}$. The transition
process works as follows: at stage t the
system is in state $x_t$, we make a decision to
apply the sensors in test $u_t$, and the system
incurs a random disturbance $w_t$, driving it
to state $x_{t+1}$. The system equation that describes
this transition is as follows:

\[ x_{t+1} = f_t(x_t, u_t, w_t), \quad t = 0, 1, \ldots \tag{3} \]

The function $f_t$ effects the transition from
state $x_t$ to state $x_{t+1}$, where
$x_{t+1} = (x_{t+1}^1, x_{t+1}^2, \ldots, x_{t+1}^C)$. This state transition
is effected through a Bayesian fusion of the
prior state $x_t$ with the local decisions
(and corresponding probabilities) resulting
from applying test $u_t$. The specific evolution
of the posterior conditional probabilities is
different for Type I and Type II problems,
as described below.

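Conditional Probability Evolution - Type I
Problems. For Type I problems, measurements
resulting from searching cell i will affect
all of the conditional probabilities $x_{t+1}$.
For this reason, the conditional probability
evolution for Type I problems is much more
complicated than for Type II problems.

For each cell $c_i \in D_t$ (a Detect Cell), the conditional probability evolution is
given below:

\[ x_{t+1}^i = \frac{x_t^i \,(1 - \beta^{i,s}) \prod_{c_j \in D_t \setminus \{c_i\}} \alpha^{j,s} \prod_{c_k \in N_t} (1 - \alpha^{k,s})}{\Sigma_D + \Sigma_N + \Sigma_L} \tag{4} \]

For each cell $c_i \in N_t$ (a Non-Detect Cell), the conditional probability evolution
is given below:

\[ x_{t+1}^i = \frac{x_t^i \,(\beta^{i,s}) \prod_{c_j \in D_t} \alpha^{j,s} \prod_{c_k \in N_t \setminus \{c_i\}} (1 - \alpha^{k,s})}{\Sigma_D + \Sigma_N + \Sigma_L} \tag{5} \]

For each cell $c_i \notin E_t$ (an Unobserved Cell), the conditional probability evolution
is given below:

\[ x_{t+1}^i = \frac{x_t^i \prod_{c_j \in D_t} \alpha^{j,s} \prod_{c_k \in N_t} (1 - \alpha^{k,s})}{\Sigma_D + \Sigma_N + \Sigma_L} \tag{6} \]

with $\Sigma_D$, $\Sigma_N$, and $\Sigma_L$ defined as follows:

\[ \Sigma_D = \sum_{c_i \in D_t} x_t^i \,(1 - \beta^{i,s}) \times \prod_{c_j \in D_t \setminus \{c_i\}} (\alpha^{j,s}) \times \prod_{c_k \in N_t} (1 - \alpha^{k,s}) \tag{7} \]

\[ \Sigma_N = \sum_{c_i \in N_t} x_t^i \,(\beta^{i,s}) \times \prod_{c_j \in D_t} (\alpha^{j,s}) \times \prod_{c_k \in N_t \setminus \{c_i\}} (1 - \alpha^{k,s}) \tag{8} \]

\[ \Sigma_L = \sum_{c_i \notin E_t} x_t^i \times \prod_{c_j \in D_t} (\alpha^{j,s}) \times \prod_{c_k \in N_t} (1 - \alpha^{k,s}) \tag{9} \]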
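
A minimal sketch of the Type I update follows, written as a single Bayesian normalization over all cells. It assumes that `alpha[j]` is the false-alarm probability and `beta[j]` the missed-detection probability of the sensor observing cell j, that each observed cell is covered by exactly one sensor, and that observations are conditionally independent; the function name `type1_update` and the data layout are hypothetical.

```python
def type1_update(x, observations, alpha, beta):
    """Posterior cell probabilities when exactly one cell holds the object.

    x            : prior probabilities, one per cell (sum to 1)
    observations : dict mapping observed cell index -> True (detect) / False (non-detect)
    alpha, beta  : per-cell false-alarm and missed-detection probabilities (assumed)
    """
    weights = []
    for i, prior in enumerate(x):
        # Likelihood of the whole observation set if the object were in cell i.
        likelihood = 1.0
        for cell, detected in observations.items():
            p_detect = (1.0 - beta[cell]) if cell == i else alpha[cell]
            likelihood *= p_detect if detected else (1.0 - p_detect)
        weights.append(prior * likelihood)
    total = sum(weights)  # plays the role of the three-part normalizing sum
    return [w / total for w in weights]


if __name__ == "__main__":
    prior = [0.5, 0.3, 0.2]
    posterior = type1_update(prior, {0: True, 1: False}, alpha=[0.2] * 3, beta=[0.1] * 3)
    print(posterior)
```

Note that the unobserved cell (index 2) still changes its probability, because the normalization couples all cells; this is the reason the Type I evolution is more involved than the Type II evolution.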

Conditional Probability Evolution - Type II Problems. For Type II
problems, measurements obtained from searching cell i will only affect
the local conditional probability $x_{t+1}^i$ for those cells that were
observed, that is, where cell $c_i \in E_t$.

For each cell $c_i \in D_t$ (a Detect Cell), the conditional probability
evolution is given below:

\[ x_{t+1}^i = \frac{x_t^i (1 - \beta^{i,s})}{x_t^i (1 - \beta^{i,s}) + (1 - x_t^i)(\alpha^{i,s})} \tag{10} \]

For each cell $c_i \in N_t$ (a Non-Detect Cell), the conditional
probability evolution is given below:

\[ x_{t+1}^i = \frac{x_t^i (\beta^{i,s})}{x_t^i (\beta^{i,s}) + (1 - x_t^i)(1 - \alpha^{i,s})} \tag{11} \]

For each cell $c_i \notin E_t$ (an Unobserved Cell), the conditional
probability evolution is given below:

\[ x_{t+1}^i = x_t^i \tag{12} \]

Cost Structure. In general DP formulations, the cost structure consists
of two portions, transition costs and a terminal cost. The cost function
is additive in that cost incurred at stage t accumulates over time.
Since the primary DSN resource constraint is related to power, we
established a cost structure for our DP formulation that is based on
power consumption.

Transition Costs. For each stage t, we execute a test $u_t$ and pay a
cost $g_t(x_t, u_t, w_t) > 0$ for implementing the test. We identified
two major power expenditures related to transition costs: sensing cost
and transmission cost. The sensing cost is a function of the number of
sensors that are in test $u_t$, while the transmission cost is only
incurred if a sensor that is in test $u_t$ submits a sensor report to
its assigned cluster head.

Terminal Cost. For our formulation of the problem, we will not impose
terminal costs since the iterative decision process continues until we
reach a termination state within an acceptable “Gamma Neighborhood.” In
addition, we have not imposed any costs as a function of time to
completion (number of stages required).

Objective Function. Our objective is to determine the optimal sequence
of sensor tests that will allow us to reach one of the termination
states with minimum expected total cost. Since there is no clear upper
bound on the number of stages required, we must plan over an infinite
time horizon.

For an infinite horizon problem, the usual DP objective is defined as
follows: given an initial state $x_0$, find a policy
$\pi = \{\mu_0, \mu_1, \ldots\}$, where $\mu_t(x_t) \in U_t$ for all
$x_t \in X$, $t = 0, 1, \ldots$, that minimizes the cost function

\[ J_\pi(x_0) = \lim_{N \to \infty} E\left[ \sum_{t=0}^{N-1} g_t(x_t, \mu_t(x_t), w_t) \right] \tag{13} \]

subject to the system equation constraint (3), repeated below for ease
of reference.

\[ x_{t+1} = f_t(x_t, u_t, w_t), \quad t = 0, 1, \ldots \]

Since termination appears to be inevitable for all reasonable heuristic
policies $\pi$ used in our ADP methodology, we make the assumption that
the limit in the definition of $J_\pi(x_0)$ is finite. This assumption
of inevitable termination allows us to consider the infinite horizon DSN
sensor management problem as a finite (albeit random-length) horizon
problem, where the length of the horizon is affected by the policy being
implemented. The transformation from an infinite to finite horizon
problem allows us to rephrase the cost function as follows:

\[ J_\pi(x_0) = E\left[ \sum_{t=0}^{T_\Omega} g_t(x_t, \mu_t(x_t), w_t) \right] \tag{14} \]

where we are totalling operating costs from stage t = 0 until stage
$T_\Omega$, the stage at which we first reach a termination state. The
assumption of inevitable termination implies that $T_\Omega$ is finite
with probability one and Equation 14 is well defined.

DP Algorithm. The DP algorithm for a finite horizon problem with fixed
time horizon T reaching state $x_T$ at stage $T_\Omega$ is defined via
the following equations:

\[ J_T(x_T) = g_T(x_T), \tag{15} \]

\[ J_t(x_t) = \min_{u_t \in U_t(x_t)} E_{w_t}\left[ g_t(x_t, u_t, w_t) + J_{t+1}(f_t(x_t, u_t, w_t)) \right], \quad t = 0, 1, \ldots, T-1 \tag{16} \]

Note that in (16), “min” denotes the greatest lower bound (or infimum)
of the set of expectations over the set $u_t \in U$. Even when the
infimum is not attained, we will use the term “min” for convenience of
notation.

On an infinite time horizon, using a total cost criterion, the optimal
cost $J^*(x_0)$ for every state $x_0$ is equal to $J_0(x_0)$ given by
the last step of the preceding DP algorithm, which proceeds backward in
time from stage $T_\Omega$ to stage 0. The following limiting form of
the DP algorithm should hold for all states x,

\[ J^*(x) = \min_{u \in U(x)} E_w\left[ g(x, u, w) + J^*(f(x, u, w)) \right]. \tag{17} \]

The expression above is a functional equation for the cost-to-go
function $J^*$, and is called Bellman’s equation (Bertsekas 2000).
Bellman’s equation is really a system of equations, one for each state,
where the optimal expected cost-to-go from state $x_t$ is coupled to the
optimal expected cost-to-go from neighboring states $x_{t+1}$. In (17),
the term $g(x, u, w)$ denotes transition costs incurred by transitioning
from state $x_t$ to state $x_{t+1}$ and the term $J^*(f(x, u, w))$
denotes the optimal cost-to-go from state $x_{t+1}$.

Thus, to act optimally at state x, we need to choose action u that
minimizes

\[ E_w\left[ g(x, u, w) + J^*(f(x, u, w)) \right]. \tag{18} \]

In doing so, we choose actions that minimize the expected cost of a
single transition plus the optimal expected long-term cost from the next
state $x_{t+1}$. This will allow optimal actions to weigh both the
short-term and long-term costs from state $x_t$. The optimization of
this system of equations is nonlinear, since we minimize with respect to
u at each stage.
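
As a toy illustration of Bellman’s equation, the sketch below runs value iteration on a hypothetical one-state search problem that is not taken from the paper: a single uncertain state, a cheap test that costs 1 and terminates with probability 0.5, and an expensive test that costs 3 and always terminates. The fixed point satisfies $J = \min(1 + 0.5J,\ 3)$, which gives $J^* = 2$ with the cheap test optimal.

```python
def value_iteration(controls, tol=1e-9, max_iter=10_000):
    """Solve J = min_u [cost_u + (1 - p_term_u) * J] for one non-terminal state.

    controls : dict name -> (stage cost, probability of reaching termination)
    Returns the approximate optimal cost-to-go and the minimizing control.
    """
    j = 0.0
    for _ in range(max_iter):
        # Expected one-stage cost plus cost-to-go; the terminal state costs nothing.
        q = {name: cost + (1.0 - p_term) * j for name, (cost, p_term) in controls.items()}
        j_new = min(q.values())
        if abs(j_new - j) < tol:
            break
        j = j_new
    best = min(q, key=q.get)
    return j_new, best


if __name__ == "__main__":
    cost_to_go, action = value_iteration({"cheap": (1.0, 0.5), "expensive": (3.0, 1.0)})
    print(round(cost_to_go, 6), action)
```

Even in this tiny case the minimization inside the recursion is what makes the equation nonlinear; in the DSN problem the same recursion would have to be carried out over a continuous belief state, which motivates the approximation discussed next.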

In  very  simple  problems  it  is  possible  to  ob¬ 
tain  a  closed-form  solution  to  the  DP  algo¬ 
rithm  presented  above,  but  these  cases  tend 
to  be  in  the  minority.  For  most  realistic 
problems,  it  is  necessary  to  numerically  solve 
the  DP  equations  to  obtain  an  optimal  pol¬ 
icy.  For  large-scale  problems  like  the  DSN 
sensor  management  problem,  both  the  state 
and  control  spaces  are  very  large.  In  these 
cases,  the  “curse  of  dimensionality”  makes  it 
difficult  to  compute  J*  and  hence  infeasible 
to  attempt  a  complete  solution  of  the  prob¬ 
lem  by  DP.  One  approach  to  address  this 
problem is to use various ADP techniques
to approximate $J^*$ and then use the approximation
to construct a workable policy $\mu$.

Solution Approximation. Policy iteration
entails starting with an initial policy,
evaluating the policy (policy evaluation
step), and then deriving an improved policy
(policy improvement step) (Bertsekas and
Tsitsiklis 1996). We decided to use a single-stage
lookahead rollout policy to execute the
policy iteration.

Rollout Policy. For our policy evaluation
step, we use Q-factors. These Q-factors consist
of a state-control pair $(x_t, u_t)$ and a stationary
policy $\mu$, defined as

\[ Q_\mu(x_t, u_t) = E_{w_t}\Big[ \underbrace{g_t(x_t, u_t, w_t)}_{\text{transition cost}} + \underbrace{J_\mu(x_{t+1})}_{\text{cost-to-go}} \Big] \tag{19} \]

The Q-factor denotes the expected cost
corresponding to starting at state $x_t$, using
control $u_t$ at the first stage, and using the
stationary policy at the second and subsequent
stages.

In order to conduct the policy improvement
step at each stage t, we select action
$\bar{\mu}(x_t)$, which represents the action corresponding
to the minimum value Q-factor,
using the equation:

\[ \bar{\mu}(x_t) = \arg\min_{u_t \in U(x_t)} Q_\mu(x_t, u_t). \tag{20} \]


However, as was the case in trying to develop
a closed-form, exact solution to the DP
algorithm, we again run into problems trying
to develop a closed-form, exact solution
to equation (19). Since we have very large
state and control spaces, we speed up the
calculation of the rollout policy, but accept
some potential performance degradation, by
identifying a subset $U(x_t)$ of promising control
actions to evaluate, rather than the full
set of possible control actions.

Q-Factor Approximation. In our implementation
of the single-stage lookahead rollout
policy, we approximate the Q-factors by
using simulation to approximate both the
transition costs $g_t(x_t, u_t, w_t)$ and the cost-to-go
functions $J_\mu(f_t(x_t, u_t, w_t))$ in (19).

For our stationary policy $\mu$ in (19) we use
the Base Policy. That is, we execute the
Base Policy $\mu_{BP}$ at the second and subsequent
stages of our rollout policy. Even
though the Base Policy is not optimal, it is
useful since it is expected to always reach
termination more quickly than any other policy.

We then approximate the Q-factor for each candidate control action
$u_t$ by conducting K simulations of a transition from state $x_t$ under
test $u_t$ to state $x_{t+1}$ and then all subsequent transitions from
state $x_{t+1}$ under the Base Policy $\mu_{BP}$ to termination.

We then use the vector of transition costs,

\[ \{ g_t(x_t, u_t, x_{t+1,1}),\ g_t(x_t, u_t, x_{t+1,2}),\ \ldots,\ g_t(x_t, u_t, x_{t+1,K}) \} \tag{21} \]

and the vector of cost-to-go’s,

\[ \{ J_{\mu_{BP}}(x_{t+1,1}),\ J_{\mu_{BP}}(x_{t+1,2}),\ \ldots,\ J_{\mu_{BP}}(x_{t+1,K}) \} \tag{22} \]

to obtain an approximation of the Q-factor, as follows:

\[ \tilde{Q}_{\mu_{BP}}(x_t, u_t) = \frac{1}{K} \sum_{k=1}^{K} \left[ g_t(x_t, u_t, x_{t+1,k}) + J_{\mu_{BP}}(x_{t+1,k}) \right] \tag{23} \]

The cost accumulated during each such trajectory is one estimate of the
Q-factor. By simulating a large enough number of such trajectories, K,
we can obtain an accurate approximation of the Q-factor,
$\tilde{Q}_{\mu_{BP}}(x_t, u_t)$, by averaging the results of all the
trajectories.

Finally, comparing these Q-factor approximations, we implement the
candidate control action $\bar{\mu}(x_t)$ corresponding to the minimum
value Q-factor, or in other words,

\[ \bar{\mu}(x_t) = \arg\min_{u_t \in U(x_t)} \tilde{Q}_{\mu_{BP}}(x_t, u_t) \tag{24} \]
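
The simulation-based Q-factor approximation can be sketched as follows. This is a simplified illustration under stated assumptions, not the SNOOPS implementation: it uses a Type II model in which each cell is observed by one sensor with a common false-alarm probability `alpha` and missed-detection probability `beta`, samples a hypothetical ground truth from the current belief for each trajectory, charges an assumed `sense_cost` per activated sensor plus an assumed `report_cost` per detection report, and treats "observe every cell" as the Base Policy. All function and parameter names are hypothetical.

```python
import random


def bayes_update(p, detected, alpha, beta):
    """Per-cell posterior after one detect / non-detect observation."""
    if detected:
        num, other = p * (1.0 - beta), (1.0 - p) * alpha
    else:
        num, other = p * beta, (1.0 - p) * (1.0 - alpha)
    return num / (num + other)


def apply_test(x, truth, cells, alpha, beta, rng, sense_cost=1.0, report_cost=0.5):
    """Observe `cells` once; return the next belief state and the stage cost."""
    x_next, cost = list(x), 0.0
    for i in cells:
        detected = rng.random() < ((1.0 - beta) if truth[i] else alpha)
        cost += sense_cost + (report_cost if detected else 0.0)
        x_next[i] = bayes_update(x[i], detected, alpha, beta)
    return x_next, cost


def terminated(x, gamma):
    return all(p <= gamma or p >= 1.0 - gamma for p in x)


def q_factor(x, cells, k, gamma, alpha, beta, rng, max_stages=1000):
    """Average cost over k trajectories: test `cells` first, then the base policy."""
    total = 0.0
    for _ in range(k):
        truth = [rng.random() < p for p in x]  # assumed: truth drawn from the belief
        state, cost = apply_test(x, truth, cells, alpha, beta, rng)
        stages = 1
        while not terminated(state, gamma) and stages < max_stages:
            state, step = apply_test(state, truth, range(len(x)), alpha, beta, rng)
            cost += step
            stages += 1
        total += cost
    return total / k


def rollout_action(x, candidates, k=200, gamma=0.1, alpha=0.2, beta=0.1, seed=0):
    """Return the candidate test with the smallest simulated Q-factor."""
    rng = random.Random(seed)
    q = {name: q_factor(x, cells, k, gamma, alpha, beta, rng)
         for name, cells in candidates.items()}
    return min(q, key=q.get), q


if __name__ == "__main__":
    belief = [0.6, 0.1, 0.05]
    best, q_values = rollout_action(belief, {"top cell": [0], "all cells": [0, 1, 2]})
    print(best, q_values)
```

The estimate for each candidate is a sample mean, so its accuracy improves roughly with the square root of K; comparing candidates with too few trajectories can select a test whose apparent advantage is only simulation noise.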


Results  and  Analysis 

In  this  section  we  describe  the  Policy  Com¬ 
parison  Analysis  we  conducted  to  evaluate 
various  DSN  sensor  management  policies 
and  the  Target  Comparison  Analysis  we  con¬ 
ducted  to  examine  the  impact  of  target  loca¬ 
tion  on  system  performance.  During  the  Pol¬ 
icy  Comparison  Analysis,  we  compared  var¬ 
ious  dynamic  control  policies  with  the  Base 
Policy  and  the  SNOOPS  Policy,  while  hold¬ 
ing  system  parameters  constant.  During  the 
Target  Comparison  Analysis,  we  compared 
various  target  configurations  to  determine 
how  changes  in  target  location  affected  the 
ability  of  the  model  to  reach  termination  and 
the  related  operating  costs. 

During  the  simulation  process,  it  was  pos¬ 
sible  to  collect  a  number  of  numerical  mea¬ 
sures  of  performance  (MOPs)  with  which  to 
evaluate  the  performance  of  the  individual 
DSN sensor management policies. The comparison
involved several key metrics, including
Total Cost, Number of Stages required
to reach termination, Computation Time,
and Success Rate.

Total Cost represents the cumulative DSN
operating costs required to reach termination.
The values presented represent the total
units of cost to reach termination, both
sensing costs and transmission costs. The
Number of Stages required to reach termination
is self-explanatory. Computation Time
represents the amount of time required to
complete the simulation run and should only
be used to compare the magnitude of performance
rather than specific performance
values. The values presented represent the
number of seconds of computation time required
to execute the simulation run. Success
Rate represents the percentage of the
replications that resulted in identifying the
correct target location upon reaching termination.

Policy Comparison Analysis. We conducted
a Policy Comparison Analysis to
compare  the  performance  of  several  heuris¬ 
tic  dynamic  control  policies  with  the  Base 
Policy  and  the  SNOOPS  Policy. 

Design  of  Experiments.  For  this  analysis, 
we  used  a  repeated  measures  design.  We 
conducted  two  iterations  for  the  analysis, 
one  for  Type  I  problems  and  the  other  for 
Type  II  problems,  so  that  we  could  compare 
model  performance  between  the  two  process¬ 
ing  techniques. 

The subjects consisted of 500 random target
locations for the Type I problem iteration
and 1000 random target locations for
the Type II problem iteration. These target
locations were randomly selected based on
the prior probability distribution.

The treatments consisted of seven different
DSN sensor management policies, each
conducted using the same set of sensor performance
characteristics and model parameters
(termination tolerance $\gamma$ = 0.1 and false
alarm rate (FAR) = 0.2) but each with a
different random number seed. The seven
treatments are described below:

• SNOOPS Policy: Use the simulation-based
policy iteration ADP technique.

• Base Policy: Activate all sensors in the
DSN.

• Action 1: Search the 10 cells with the
highest $x_t^i$ values, using the single best
sensor for each cell.

• Action 2: Search the 20 cells with the
highest $x_t^i$ values, using the single best
sensor for each cell.

• Action 3: Search the 30 cells with the
highest $x_t^i$ values, using the single best
sensor for each cell.

• Action 4: Search the 20 cells with the
lowest $x_t^i$ values, using the single best
sensor for each cell.

• Action 5: Search the 10 cells with the
highest $x_t^i$ values, using the two best
sensors for each cell.

We  expected  that  the  Base  Policy  and 
SNOOPS  Policy  would  provide  an  upper 
and  lower  bound  on  expected  total  operat¬ 
ing  costs.  The  Base  Policy  should  provide 
an  upper  bound  since  it  is  expected  to  incur 
the  greatest  cost  during  each  sensing  itera¬ 
tion. The SNOOPS Policy should provide a
lower bound since the simulation-based policy
iteration technique is designed to select
the control action from the set $U(x_t)$ that is
“best” for each sensing iteration.

Type I Problem Results. The Total Cost results
for the Type I problem simulation runs
are depicted graphically in Figure 10. The
figure represents the 95 percent confidence
intervals for the mean total cost, calculated
over 500 replications, each using a different
random number seed.
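
For readers who want to reproduce intervals of this kind from their own replications, the sketch below computes a confidence interval for a mean total cost. It uses a normal approximation, which is an assumption made here (the paper does not state how its intervals were computed) and is reasonable only for large replication counts such as 500; the function name and the sample values are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(costs, level=0.95):
    """Normal-approximation confidence interval for the mean of `costs`."""
    n = len(costs)
    centre = mean(costs)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95 percent level
    half_width = z * stdev(costs) / n ** 0.5
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    sample_costs = [10.0, 12.0, 14.0, 16.0, 18.0]  # illustrative values only
    print(mean_confidence_interval(sample_costs))
```

For small samples, such as the 50 replications used per target location, a Student t critical value would give a slightly wider and more appropriate interval.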

Figure 10: Type I Problem Results

Conducting ANOVA and using the Scheffé procedure at the 0.05
significance level, we determined that there was no significant
difference between Action 5 and the SNOOPS Policy. However, these two
policies were significantly better than the group consisting of Action
3, Action 1, and Action 2, which were not significantly different from
each other. This second group was significantly better than the group
consisting of the Base Policy and Action 4, which were not significantly
different from each other.

Type II Problem Results. The Total Cost results for the Type II problem
simulation runs are depicted graphically in Figure 11. The figure
represents the 95 percent confidence intervals for the mean total cost,
calculated over 1000 replications, each using a different random number
seed.

Figure 11: Type II Problem Results

Conducting ANOVA and using the Scheffé procedure at the 0.05
significance level, we determined that there was no significant
difference between Action 5, Action 1, and the SNOOPS Policy. However,
these three policies were significantly better than Action 2, which was
significantly better than Action 3. This policy was in turn
significantly better than the group consisting of the Base Policy and
Action 4, which were not significantly different from each other.

Target  Comparison  Analysis.  We  con¬ 
ducted  a  Target  Comparison  Analysis  to 
compare  the  performance  of  several  heuris¬ 
tic  dynamic  control  policies  with  the  Base 
Policy  and  the  SNOOPS  Policy  for  specific 
target  locations  in  order  to  determine  the 
impact  of  target  location  on  system  perfor¬ 
mance. 

Design of Experiments. For this analysis,
we used a repeated measures design, just as
for the previous analyses. We conducted two
iterations for the analysis, one for Type I
problems and the other for Type II problems,
so that we could compare model performance
between the two processing techniques.

The subjects consisted of 50 iterations for three target locations. To
determine appropriate target locations, we reviewed the individual
results from the Policy Comparison Analysis. We found that target
locations fell into one of three categories: “easy”, “average”, and
“difficult”. For example, the four most difficult targets to detect and
localize (i.e., resulting in the highest total costs) were Targets 6,
14, 35, and 47, as depicted in Figure 12.

Figure 12: Target Location Difficulty

We decided to choose one of these “difficult” targets (Target 47), an
“average” target (Target 1), and an “easy” target (Target 33) as the
subjects for the Target Comparison Analysis.

We used the same seven treatments as for the Policy Comparison Analysis,
each conducted using the same set of sensor performance characteristics
and model parameters ($\gamma$ = 0.1 and FAR = 0.2) but each with a
different random number seed.

Type I Problem Results. The Total Cost results for the Type I problem
simulations are depicted graphically in Figures 13, 14, and 15. Each
figure depicts the 95 percent confidence intervals for the mean total
cost, calculated over 50 replications, each using a different random
number seed.

Figure 13: Type I Problem Results - Total Cost (Target 1)

Conducting ANOVA and using the Scheffé procedure at the 0.05
significance level, we determined that for Target 1 there was no
significant difference between the SNOOPS Policy, Action 1, Action 5,
Action 2, and Action 3. However, this group was significantly better
than the group consisting of the Base Policy and Action 4, which were
not significantly different from each other.

Figure 14: Type I Problem Results - Total Cost (Target 33)

Conducting the same analysis for Target 33, we obtained similar results.
There was no significant difference between the SNOOPS Policy, Action 1,
Action 5, Action 2, and Action 3. However, this group was significantly
better than the group consisting of the Base Policy and Action 4, which
were not significantly different from each other.

Figure 15: Type I Problem Results - Total Cost (Target 47)

Conducting the same analysis for Target 47, we found that there was no
significant difference between the Base Policy, the SNOOPS Policy,
Action 4, Action 5, Action 2, and Action 3. This group was significantly
better than Action 1.

Type II Problem Results. The Total Cost results for the Type II problem
simulations are depicted graphically in Figures 16, 17, and 18. Each
figure depicts the 95 percent confidence intervals for the mean total
cost, each using a different random number seed.

Figure 16: Type II Problem Results - Total Cost (Target 1)

Conducting ANOVA and using the Scheffé procedure at the 0.05
significance level, we determined that for Target 1, there was no
significant difference between Action 1, the SNOOPS Policy, Action 5,
Action 2, and Action 3. However, this group was significantly better
than the Base Policy, which was significantly better than Action 4.

Figure 17: Type II Problem Results - Total Cost (Target 33)

Conducting the same analysis for Target 33, we found that there was no
significant difference between Action 1, Action 2, the SNOOPS Policy,
Action 3, and Action 5. However, this group was significantly better
than the group consisting of the Base Policy and Action 4, which were
not significantly different from each other.

Figure 18: Type II Problem Results - Total Cost (Target 47)

Conducting the same analysis for Target 47, we found that Action 5 was
significantly better than the group consisting of the SNOOPS Policy and
the Base Policy, which were not significantly different from each other.
This group was significantly better than Action 1, which was
significantly better than the group consisting of Action 4, Action 2,
and Action 3, which were not significantly different from each other.

Conclusions 

In  this  paper,  we  have  described  our  efforts 
to  develop  a  model  that  can  provide  system- 
level  management  of  a  DSN.  The  goal  of 
our  research  effort  was  to  develop  a  model 
that  could  identify  a  sensor  control  strategy 
that  could  accomplish  the  sensing  mission 
while  reducing  resource  usage  compared  to 
the  Base  Policy  of  activating  all  sensors. 

After  reviewing  the  data,  we  are  confi¬ 
dent  that  our  ADP  approach  is  feasible 
for  generating  efficient  DSN  sensor  man¬ 
agement  strategies  for  complex,  large-scale 
DSNs.  The  sensor  control  strategy  recom¬ 
mended  by  our  model  was  more  efficient 
than  the  Base  Policy,  requiring  far  less  bat¬ 
tery  power  to  accomplish  the  same  sensing 
mission.  For  Type  I  problems,  the  SNOOPS 
Policy  used  31  percent  less  battery  power 
than  the  Base  Policy  and  for  Type  II  prob¬ 
lems,  the  SNOOPS  Policy  used  47  percent 
less  battery  power. 

A  comparison  of  the  Base  Policy  with  the 
SNOOPS  Policy  with  all  parameters  held 
constant  and  the  same  target  location  is  pre¬ 
sented  in  Figure  19.  While  this  comparison 
represents  only  a  single  instance,  it  is  rep¬ 
resentative  of  the  expected  performance  of 
both  policies,  based  on  the  data. 

In  the  specific  instance  represented  by  this 
figure,  both  the  Base  Policy  and  SNOOPS 
Policy  were  able  to  successfully  locate  the 
target.  As  expected,  the  Base  Policy  was 
quicker,  taking  only  four  stages  versus  six 
stages.  However,  again  as  expected,  the 
Base Policy was more costly, consuming 175
units of power whereas the SNOOPS Policy
consumed only 37 units. In this case, the
SNOOPS Policy was more efficient than the
Base Policy.

Figure 19: Base Policy vs. SNOOPS Policy
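
As a quick check on the figures quoted for this instance, the short
Python sketch below computes the relative power saving and the extra
stages incurred. The helper name percent_reduction is ours, introduced
only for illustration. The result applies to this single instance and
should not be confused with the 31 and 47 percent averages reported
above.

```python
def percent_reduction(baseline: float, alternative: float) -> float:
    """Percentage by which `alternative` undercuts `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - alternative) / baseline


# Values quoted in the text for the Figure 19 instance.
base_power, snoops_power = 175, 37      # units of power
base_stages, snoops_stages = 4, 6       # stages to termination

saving = percent_reduction(base_power, snoops_power)
extra_stages = snoops_stages - base_stages

print(f"power saving: {saving:.1f} percent")
print(f"extra stages: {extra_stages}")
```

For this instance the saving is roughly 79 percent, at the price of
two additional stages.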

Another  important  outcome  of  our  re¬ 
search  is  the  insight  that  simple  dynamic 
control  policies  based  on  the  underlying 
conditional  probability  distribution  can  be 
nearly  as  effective  as  the  SNOOPS  Policy 
while  incurring  a  significantly  lower  compu¬ 
tational  burden. 

Total  Cost.  These  results  indicate  that 
the  SNOOPS  Policy  indeed  outperforms  the 
Base  Policy.  As  expected,  the  Base  Policy 
generally  provides  an  upper  bound  on  ex¬ 
pected  total  operating  costs.  For  both  Type 
I  and  Type  II  problems,  the  variability  of  the 
results makes it difficult to identify any significant
difference between the SNOOPS Policy
and  Actions  1,  2,  3,  and  5.  These  results  in¬ 
dicate  that  these  four  dynamic  control  poli¬ 
cies  perform  comparably  with  the  SNOOPS 
Policy  in  terms  of  total  cost. 

Number  of  Stages.  These  results  were 
consistent for both Type I and Type II
problems. In both cases, the Base Policy
required the fewest stages to reach a
termination state. For both Type I and Type II
problems,  the  SNOOPS  Policy  and  Actions 
2,  3,  and  5  were  comparable  in  terms  of  how 
long  it  took  to  reach  a  termination  state, 
generally  two  to  three  times  longer  than  the 
Base  Policy. 

Computation  Time.  The  general  trend 
is  that  the  SNOOPS  Policy  takes  much 
longer  than  the  Base  Policy  or  any  of  the 
dynamic  control  policies.  In  fact,  the  com¬ 
putation  time  was  on  average  about  80  per¬ 
cent  more  for  the  SNOOPS  Policy  than  for 
the  Base  Policy.  This  is  expected  since  there 
is  a  large  amount  of  simulation  required  to 
execute  the  simulation-based  policy  itera¬ 
tion  process  that  serves  as  the  core  of  the 
SNOOPS  Policy. 

In  comparing  Type  I  and  Type  II  prob¬ 
lem  computation  times,  we  observed  that 
the  Type  I  problems  took  longer.  This  was 
a  result  of  the  assumption  that  there  was 
a  single  target  in  the  search  region,  requir¬ 
ing  the  update  of  the  conditional  probability 
for  each  cell  in  the  search  region  with  every 
execution  of  the  Bayes  update  process.  The 
Type  II  problems  only  updated  observed  cell 
conditional  probability  distributions. 
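
The difference in update work can be illustrated with a minimal
sketch. It assumes a simple binary sensor model with a detection
probability and a false-alarm probability (p_detect and p_false, with
arbitrarily chosen values). Neither the function names nor the sensor
model are taken from SNOOPS itself; the sketch only shows why a
single-target assumption forces every cell to be touched.

```python
def update_type1(prior, observations, p_detect=0.9, p_false=0.1):
    """Single-target update: `prior` sums to 1 over all cells.

    `observations` maps an observed cell index to True (detection) or
    False (no detection). Normalization changes every cell, observed
    or not.
    """
    unnormalized = []
    for hypothesis, p in enumerate(prior):
        likelihood = 1.0
        for cell, detected in observations.items():
            p_hit = p_detect if cell == hypothesis else p_false
            likelihood *= p_hit if detected else 1.0 - p_hit
        unnormalized.append(p * likelihood)
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def update_type2(prior, observations, p_detect=0.9, p_false=0.1):
    """Independent-cell update: only observed cells change."""
    posterior = list(prior)
    for cell, detected in observations.items():
        p = prior[cell]
        like_target = p_detect if detected else 1.0 - p_detect
        like_empty = p_false if detected else 1.0 - p_false
        posterior[cell] = (p * like_target) / (
            p * like_target + (1.0 - p) * like_empty
        )
    return posterior


prior = [0.25, 0.25, 0.25, 0.25]
type1_posterior = update_type1(prior, {0: True})
type2_posterior = update_type2(prior, {0: True})
print(type1_posterior)  # every entry differs from the prior
print(type2_posterior)  # only cell 0 differs from the prior
```

In the first function the work per update grows with the number of
cells in the search region; in the second it grows only with the
number of observed cells.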

Note  that  for  the  Type  I  problems  the  re¬ 
quired  computation  time  for  the  SNOOPS 
Policy  averaged  about  14  minutes,  much  too 
long to allow for real-time execution in a
real-world application. This was not a problem
for  the  Type  II  problems  since  the  average 
computation  time  was  under  a  minute. 

Success  Rate.  For  Type  I  problems,  each 
policy  resulted  in  a  100  percent  success  rate 
in  reaching  the  correct  termination  state, 
with  the  lone  exception  of  Policy  1  for  Target 
47,  where  the  success  rate  was  96  percent. 


For  Type  II  problems,  every  policy  exceeded 
an  82  percent  success  rate  in  reaching  the 
correct  termination  state.  The  target  that 
appeared  to  provide  the  most  difficulty  in 
terms  of  this  measure  was  Target  1. 

One  source  for  the  difference  in  results  be¬ 
tween  the  Type  I  and  Type  II  problems 
could  be  the  requirement  to  update  every 
cell  for  the  Type  I  problem.  In  this  case, 
each  observation  provides  more  information 
since even the conditional probabilities for
unobserved cells get updated. In the Type II
problems  it  appears  that  the  system  is  more 
likely  to  settle  into  an  incorrect  termination 
region,  much  akin  to  reaching  a  local  opti¬ 
mum  versus  a  global  optimum. 

SNOOPS  Selection  Rate.  For  Type  I 
problems, Action 3 was selected the most
often. Action 2 was selected the next most
frequently, followed by Action 1, Action 5, and
Action  4.  The  obvious  outlier  was  Action  4, 
which  was  selected  almost  half  as  often  as 
the  other  policies. 

For Type II problems, Action 5 was
selected the most often. Action 1 was selected
the  next  most  frequently,  followed  by  Action 
3,  Action  2,  and  Action  4.  The  obvious  out¬ 
lier  was  Action  4,  which  again  was  selected 
almost half as often as the other policies.

Impact  of  Target  Location.  For  both 
Type I and Type II problems, the DSN had
difficulty  in  detecting  Target  47  although  the 
magnitude  of  difficulty  was  much  higher  for 
the Type I problem. It turns out that Target
47 is almost on top of a sensor, but still
within  range  of  several  other  sensors.  Inter¬ 
estingly,  this  is  also  the  case  for  the  other 
identified  “difficult”  targets  in  Figure  12: 
Targets 6, 14, and 35. It appears that this
combination of features prevents the DSN
from quickly isolating the target location
and reaching a termination state.

Future  Research 

This  work  establishes  a  sound  foundation 
for  continued  research  into  the  DSN  sen¬ 
sor  management  problem.  However,  there 
is  a  tremendous  amount  of  work  remaining 
in  this  area  as  we  continue  to  improve  the 
SNOOPS  simulation  model  to  make  it  more 
robust  and  versatile.  In  addition,  there  are 
possible  extensions  of  this  work  to  address 
other  sensor  issues  as  well  as  other  research 
areas. 

SNOOPS  Improvements.  While 
SNOOPS  is  already  a  fairly  capable  DSN 
simulation model, we have identified a
number of improvements that will extend
its capabilities, including:

Moving  targets.  Implement  a  “Target 
Movement  Filter”  or  “Latency  Filter”  to  ex¬ 
tend  results  to  moving  targets. 

Multiple  targets.  Conduct  simulations 
with  multiple  targets  to  determine  if  the  sin¬ 
gle  target  results  continue  to  apply.  The 
structure  is  already  present  in  SNOOPS  but 
has  yet  to  be  exploited. 

Disparate  sensors.  Conduct  simulations 
with  disparate  sensors  to  examine  the  bene¬ 
fit  of  sensor  collaboration.  The  structure  is 
already  present  in  SNOOPS  but  has  yet  to 
be  exploited. 

ADP  mechanism.  Investigate  alternative 
Candidate Actions for tJt. Try various
improvements to the cost-to-go approximation
process, to include increasing from a
one-step lookahead to a multiple-step lookahead.
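
To make the lookahead distinction concrete, the following toy sketch
compares a one-step and a two-step lookahead on an invented problem.
Everything in it is an assumption made for illustration: the state is
simply a count of unresolved cells, the two actions ("narrow" and
"wide") and their costs are hypothetical, and a fixed heuristic stands
in for the simulation-based cost-to-go approximation used in SNOOPS.

```python
# Hypothetical actions: name -> (stage cost, cells resolved per stage).
ACTIONS = {"narrow": (1.0, 1), "wide": (2.4, 3)}


def step(state, action):
    """Return (probability, next_state) pairs; deterministic in this toy."""
    _, resolved = ACTIONS[action]
    return [(1.0, max(0, state - resolved))]


def cost_to_go(state):
    """Crude stand-in for an approximate cost-to-go function."""
    return 0.5 * state


def value(state, depth):
    """Minimum expected cost over `depth` stages, then the approximation."""
    if state == 0:          # termination state: nothing left to pay
        return 0.0
    if depth == 0:          # lookahead horizon reached
        return cost_to_go(state)
    return min(q_value(state, a, depth) for a in ACTIONS)


def q_value(state, action, depth):
    """Cost of `action` now plus expected cost of the remaining lookahead."""
    total = ACTIONS[action][0]
    for prob, nxt in step(state, action):
        total += prob * value(nxt, depth - 1)
    return total


def choose_action(state, depth):
    """Pick the action with the lowest lookahead cost."""
    return min(ACTIONS, key=lambda a: q_value(state, a, depth))


one_step = choose_action(3, depth=1)
two_step = choose_action(3, depth=2)
print(one_step, two_step)
```

In this toy problem the one-step lookahead is misled by the optimistic
heuristic and prefers the cheap action, while the two-step lookahead
selects the action that is actually cheaper overall. The cost of the
deeper search grows with the number of actions raised to the lookahead
depth, which matters given the computation times reported above.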

Cluster  assignment.  Examine  a  dynamic 
cluster  assignment  capability. 


Cost  structure.  Examine  the  effect  of  in¬ 
troducing  costs  to  account  for  the  number  of 
stages  required  to  reach  termination.  Inves¬ 
tigate  the  impact  of  a  “Number  of  Stages” 
penalty  cost  to  encourage  quicker  termina¬ 
tion  while  still  trying  to  minimize  operating 
costs.  This  could  be  accomplished  by  imple¬ 
menting  a  weighting  mechanism  in  the  op¬ 
timization  process.  Investigate  the  impact 
of  a  phased  cost  function,  where  the  cost  of 
using  a  sensor  increases  as  it  approaches  the 
end of its battery life. This approach could
be expected to better preserve the average
battery power within the network by avoiding
the complete depletion of any single sensor.
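
The two cost ideas above can be sketched as follows. This is only one
possible form, under our own assumptions: the linear stage penalty,
the linear growth of sensor cost as the battery drains, and the
parameter names stage_weight and steepness are all hypothetical and
are not part of the current SNOOPS model. The power and stage values
in the usage lines are those quoted for the Figure 19 instance.

```python
def weighted_total_cost(operating_cost, num_stages, stage_weight=0.0):
    """Operating cost plus a per-stage penalty (weight 0 means no penalty)."""
    return operating_cost + stage_weight * num_stages


def phased_sensor_cost(base_cost, battery_fraction, steepness=2.0):
    """Cost of one activation, rising as the remaining battery falls.

    `battery_fraction` is the remaining battery life in [0, 1]. A full
    battery costs `base_cost`; an empty one costs (1 + steepness) times
    as much.
    """
    if not 0.0 <= battery_fraction <= 1.0:
        raise ValueError("battery_fraction must lie in [0, 1]")
    return base_cost * (1.0 + steepness * (1.0 - battery_fraction))


# Figure 19 instance: Base Policy 175 units in 4 stages,
# SNOOPS Policy 37 units in 6 stages.
for weight in (10, 100):
    base = weighted_total_cost(175, 4, weight)
    snoops = weighted_total_cost(37, 6, weight)
    print(weight, base, snoops)

print(phased_sensor_cost(1.0, 1.0), phased_sensor_cost(1.0, 0.5))
```

With a small stage weight the SNOOPS Policy remains cheaper for this
instance; with a sufficiently large weight (above 69 for these
numbers) the faster Base Policy becomes preferable, which is the
tradeoff a weighting mechanism would expose.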

Increased realism. Implement terrain,
vegetation, and weather filters to represent the
impact  of  these  features  on  both  observation 
and  communication. 

Extensions  to  Other  Sensor  Issues. 
The  SNOOPS  model  provides  a  new  capa¬ 
bility  to  examine  critical  aspects  of  sensor 
fusion  and  DSN  sensor  management.  Possi¬ 
ble  uses  of  the  model  include: 

Sensor  placement.  Identify  “optimal”  sen¬ 
sor  locations  by  implementing  various  sensor 
placement  schemes  as  static,  stationary  sen¬ 
sor  usage  policies.  Simulating  each  of  these 
policies  can  help  determine  which  policy  (or 
sensor  location  scheme)  provides  the  best  re¬ 
sults. 

Sensor  mix.  Evaluate  various  sensor  net¬ 
work  compositions  to  gain  insights  into  sen¬ 
sor  mix  issues. 

Sensor fusion. Derive insights into the
fusion of observations from different sensor
types. Investigate the ability of non-imaging
sensors to provide adequate situational
awareness where “precision” emplacement
of more capable sensors is not possible.

DSN  operational  concepts.  Develop  opera¬ 
tional  concepts  to  better  integrate  DSN  op¬ 
erations  with  user  needs. 